Application-specific foreign-interface generation

Application-specific foreign-interface generation, John Reppy and Chunyan Song, October 2006.

A foreign interface (FI) mechanism to support interoperability with libraries written in other languages (especially C) is an important feature in most high-level language implementations. Such FI mechanisms provide a Foreign Function Interface (FFI) for the high-level language to call C functions and marshaling and unmarshaling mechanisms to support conversion between the high-level and C data representations. Often, systems provide tools to automate the generation of FIs, but these tools typically lock the user into a specific model of interoperability. It is our belief that the policy used to craft the mapping between the high-level language and C should be distinct from the underlying mechanism used to implement the mapping.

In this paper, we describe a FI generation tool, called FIG (for Foreign Interface Generator) that embodies a new approach to the problem of generating foreign interfaces for high-level languages. FIG takes as input raw C header files plus a declarative script that specifies the generation of the foreign interface from the header file. The script sets the policy for the translation, which allows the user to tailor the resulting FI to his or her application. We call this approach application-specific foreign-interface generation. The scripting language uses rewriting strategies as its execution model. The other major feature of the scripting language is a novel notion of composable typemaps that describe the mapping between high-level and low-level types.

FFIs are a perennial engineering problem, and it's very nice to see progress being made on automating what's automatable about building interfaces. Their interface specification language is built from two little DSLs. The first one is a language that for specifying how to map low level types to high level types, and the second one is a rewriting-based language for translating API functions, which makes use of the type mapping programs you defined earlier. The whole thing is quite pretty, and seems to read very well.

An interesting gimme for you stack-language fans: the DSL that Reppy and Song use to specify type mappings from low-level to high-level types is a combinator-based language that reads a bit like Forth or Postscript.

Python 3000 Status Update

Guido updates the vision. Syntax focus seems cancerous, leaving FP sandwiched between set literals and backtick syntax, under Miscellany:

  • reduce() is gone. This doesn't mean I don't like higher-order functions; it simply reflects that almost all code that uses reduce() becomes more readable when rewritten using a plain old for-loop. (Example.)
  • lambda, however, lives.

Guido says that example "gives reduce() a bad name" which leaves me wondering as to its relevance. The idea that FP = Miscellany makes me wonder why the more stunning Python success stories (Google) involve FP techniques. Elsewhere:

  • zip(), map(), filter() return iterables

So FP is not even a free-standing category yet. I sometimes wish PLT Spy would revive, or that some FP language would target Python intermediate code. The value of Python is often stated to be its libraries.

I also wonder if Python people might build intuition by watching REBOL (=Lisp) people at work. They seem to enjoy puzzles. The motivating notion would be "hey! that REBOL trick should be so easy in Python."

Python in Pardus Linux

Pardus Linux is a case study of functional Python. It's a Linux distribution built from semi-scratch, the main focii being package management and init subsystems - places where C and shell script make poor sense. A funded group has finally tackled these issues.

A package management software deals a lot with sets, lists, and dependency graphs....We have extensively used functional operators (map, filter, reduce) and list comprehensions, even metaclasses are used in a few places.

Someone nudge Guido. Scheme or Oz might have been the better choice, but give them credit. They admit frankly to social acceptance issues.

C++ Historical Sources Archive

Seeing as we just had a lively discussion of Stroustrup’s HOPL paper, it's more than appropriate to mention another great resource courtesy of Paul McJones: The C++ Historical Sources Archive.

Among the treasures are the source code of the Cfront releases, but there's much more there so go take a peek.

Derivatives of Regular Expressions

Derivatives of Regular Expressions, Janusz Brzozowski, Journal of the ACM 1964.

Kleene's regular expressions, which can be used for describing sequential circuits, were defined using three operators (union, concatenation and iterate) on sets of sequences. Word descriptions of problems can be more easily put in the regular expression language if the language is enriched by the inclusion of other logical operations. However, in the problem of converting the regular expression description to a state diagram, the existing methods either cannot handle expressions with additional operators, or are made quite complicated by the presence of such operators. In this paper the notion of a derivative of a regular expression is introduced and the properties of derivatives are discussed. This leads, in a very natural way, to the construction of a state diagram from a regular expression containing any number of logical operators.

This is one of my favorite papers. It describes a very cute algorithm for building deterministic finite automata directly from a regular expression. The key trick is the idea of a derivative of a regular expression with respect to a string, which is a non-obvious but fun idea.

Note: This is an ACM DL link; I couldn't find the paper freely available online. :(

Solving a Sudoku with one SQL-statement

Doing strange things with SQL is always fun...

This Sudoku solver makes use of Oracle 10g's MODEL clause, which seems quite hairy.

SQL isn't mentioned around here all that often, so I am glad we can at least remind ourselves from time to time of the most heavily used declarative language out there by posting SQL puzzles and hacks...

ECMAScript Edition 4 Reference Implementation

The first pre-release of the reference implementation of ECMAScript Edition 4 (a.k.a. JavaScript 2) is now available. We've created a new web site for information about the ECMAScript specification and reference implementation. You can download source and binary forms of the reference implementation.

As we've discussed before here on LtU, the reference implementation of ECMAScript is being written in Standard ML. This choice should have many benefits, including:

  • to make the specification more precise than previous pseudocode conventions
  • to give implementors an executable framework to test against
  • to provide an opportunity to find bugs in the spec early
  • to spark interest and spur feedback from the research and user communities
  • to provide fodder for interesting program analyses to prove properties of the language (like various notions of type soundness)
  • to use as a test-bed for interesting extensions to the language

This pre-release is just our first milestone, i.e., the first of many "early and often" releases. Neither the specification nor the reference implementation is complete, and this early implementation has plenty of bugs. We encourage anyone interested to browse the bug database and report additional bugs.

We're happy to hear your feedback, whether it's bug reports or comments here on LtU or on the es4-discuss mailing list.

A Functional Description of TeX's Formula Layout

A Functional Description of TeX's Formula Layout, Reinhold Heckmann and Reinhard Wilhelm, Journal of Functional Programming 1997.

While the quality of the results of TeX's mathematical formula layout algorithm is convincing, its original description is hard to understand since it is presented as an imperative program with complex control flow and destructive manipulations of the data structures representing formulae. In this paper, we present a re-implementation of TeX's formula layout algorithm in the functional language SML, thereby providing a more readable description of the algorithm, extracted from the monolithical TeX system.

I've wanted a simple description of what TeX is doing for a long time, and it's nice to see that laid out in a clear and readable way.

Evaluating High-Level Distributed Language Constructs

I saw this mentioned by one of the authors on the erlang-questions mailing list:

The paper investigates the impact of high level distributed programming language constructs on the engineering of realistic software components. Based on reengineering two non-trivial telecoms components, we compare two high-level distributed functional languages, Erlang and GdH, with conventional distributed technologies C++/CORBA and C++/UDP.

GdH = Glasgow distributed Haskell.

HOPL III: Evolving a language in and for the real world: C++ 1991-2006

Yet another in the series of draft papers for HOPL-III. This one from Bjarne Stroustrup on Evolving a language in and for the real world: C++ 1991-2006. The paper starts the discussion at the point in time where his 1994 Design and Evolution book ended (which coincides with about the last time I used C++ on a professional basis - meaning I got some insight into what I missed out on for the last dozen years).

The talk outlines the history of the C++ programming language from the early days of its ISO standardization (1991), through the 1998 ISO standard, to the later stages of the C++0x revision of that standard (2007). The emphasis is on the ideals, constraints, programming techniques, and people that shaped the language, rather than the minutiae of language features. Among the major themes are the emergence of generic programming and the STL (the C++ standard library’s algorithms and containers).

Given the period of time covered, generics and the STL are the major highights of the paper. Lots of political discussion (the parts on Sun and Microsoft are mostly obvious, but it's amusing to see Bjarne speak out on the subjects). Much like the Design and Evolution book, this paper is worth a read by anyone interested in PL design, no matter their particular take on C++. Bjarne provides valuable insight on the forces that shape PLs, as well as providing constructive criticism.

Personally I found the discussion on C++0x the most interesting, as it reveals the issues that C++ is trying to overcome as well as the direction the language is headed. I've been tinkering with Boost of late, trying to figure out the FP facilities, but without much luck. Similar to C#3.0 and Java1.7, C++0x is proposing lambdas and a limited form of type inference for variables. But that's a minor addendum, as the paper makes clear that optional GC, concurrency and more thorough libraries are the major aspects to be addressed.